Evaluation of Negentropy-based Cluster Validation Techniques in Problems with Increasing Dimensionality
Abstract
The aim of a crisp cluster validity index is to quantify the quality of a given data partition. It allows one to select the best partition from a set of candidates and to determine the number of clusters. Negentropy-based cluster validation has recently been introduced. This approach appears to perform better than other state-of-the-art techniques, and it is simple to compute. However, like many other cluster validation approaches, it has problems when some partition regions contain only a small number of points. Different heuristics have been proposed to cope with this problem. In this article we systematically analyze the performance of several negentropy-based validation approaches, including a new heuristic, on clustering problems of increasing dimensionality, and compare them with reference criteria such as AIC and BIC. Our results on synthetic data suggest that the newly proposed negentropy-based validation strategy can outperform AIC and BIC when the ratio of the number of points to the dimension is not high, which is a very common situation in real applications.
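To make the comparison with AIC and BIC more concrete, the following Python sketch scores a crisp partition under the assumption that each cluster is modelled as a full-covariance Gaussian fitted by maximum likelihood to its own points (a classification likelihood). The function name gaussian_partition_aic_bic is hypothetical, and the paper may define the criteria differently. The parameter count, (k - 1) + kd + kd(d + 1)/2, grows quadratically with the dimension d, and any cluster with d or fewer points yields a singular covariance. This is one way to see why a low ratio of points to dimension is difficult for such criteria.

```python
import numpy as np


def gaussian_partition_aic_bic(X, labels):
    """Return (AIC, BIC) of a crisp partition, one full-covariance Gaussian per cluster.

    Hypothetical helper: uses the classification log-likelihood of the hard
    assignments, with ML mixing weights, means and covariances per cluster.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, d = X.shape
    clusters = np.unique(labels)
    log_lik = 0.0
    for c in clusters:
        Xc = X[labels == c]
        nc = len(Xc)
        if nc <= d:
            # d or fewer points always give a singular covariance estimate.
            raise ValueError(f"cluster {c!r} has {nc} points in {d} dimensions")
        cov = np.cov(Xc, rowvar=False, bias=True).reshape(d, d)
        sign, logdet = np.linalg.slogdet(cov)
        if sign <= 0:
            raise ValueError(f"cluster {c!r} has a singular covariance matrix")
        # Maximised Gaussian log-likelihood of the cluster's own points ...
        log_lik += -0.5 * nc * (d * np.log(2 * np.pi) + logdet + d)
        # ... plus the log mixing weight nc / n for each of those points.
        log_lik += nc * np.log(nc / n)
    k = len(clusters)
    # Free parameters: k - 1 weights, k means, k symmetric d x d covariances.
    n_params = (k - 1) + k * d + k * d * (d + 1) // 2
    aic = -2.0 * log_lik + 2.0 * n_params
    bic = -2.0 * log_lik + n_params * np.log(n)
    return float(aic), float(bic)
```

Given label vectors for competing partitions of the same data, the partition with the lower AIC or BIC would be preferred under this sketch.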
Similar articles
Fuzzy Cluster Validation Using the Partition Negentropy Criterion
We introduce the Partition Negentropy Criterion (PNC) for cluster validation. It is a cluster validity index that rewards the average normality of the clusters, measured by means of the negentropy, and penalizes the overlap, measured by the partition entropy. The PNC is aimed at finding well separated clusters whose shape is approximately Gaussian. We use the new index to validate fuzzy partiti...
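The overlap penalty mentioned above is measured by the partition entropy. A minimal Python sketch of that term alone follows, assuming the classical Bezdek definition PE = -(1/n) sum_j sum_i u_ij log u_ij over an n x k fuzzy membership matrix whose rows sum to 1. How PNC combines this penalty with the negentropy term is not reproduced here, and the function name is illustrative.

```python
import numpy as np


def partition_entropy(U):
    """Bezdek's partition entropy of a fuzzy membership matrix U (n points x k clusters).

    Rows of U are assumed to sum to 1. Returns 0 for a crisp partition and
    log(k) when every point belongs equally to all k clusters.
    """
    U = np.asarray(U, dtype=float)
    # Treat 0 * log(0) as 0 so crisp memberships contribute nothing.
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(U > 0, U * np.log(U), 0.0)
    return float(-terms.sum() / U.shape[0])
```

Higher values indicate more overlap between clusters, which is why an index that favours well-separated clusters would penalize it.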
The effect of low number of points in clustering validation via the negentropy increment
We recently introduced the negentropy increment, a validity index for crisp clustering that quantifies the average normality of the clustering partitions using the negentropy. This index can satisfactorily deal with clusters with heterogeneous orientations, scales and densities. One of the main advantages of the index is the simplicity of its calculation, which only requires the computation of ...
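The calculation can be sketched in Python as follows, assuming the negentropy increment is the average within-cluster negentropy minus the negentropy of the whole data set. Since negentropy is the entropy of a Gaussian with the same covariance minus the variable's own entropy, and a crisp partition satisfies h(X) = sum_i p_i h(X|i) - sum_i p_i log p_i, the unknown entropies cancel and only covariance log-determinants remain: dJ = 1/2 sum_i p_i log|S_i| - 1/2 log|S| - sum_i p_i log p_i. Lower values indicate clusters closer to Gaussian. The function name is hypothetical, maximum-likelihood covariance estimates are an assumption, and none of the small-cluster heuristics discussed in these papers are implemented; a cluster with d or fewer points simply raises an error.

```python
import numpy as np


def negentropy_increment(X, labels):
    """Negentropy increment of a crisp partition; lower means more Gaussian clusters.

    Hypothetical helper implementing
        dJ = 1/2 * sum_i p_i log|S_i| - 1/2 * log|S| - sum_i p_i log p_i,
    with p_i the fraction of points in cluster i, S_i its covariance matrix and
    S the covariance matrix of the whole data set (ML estimates).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, d = X.shape

    def log_det_cov(A):
        # A covariance estimated from d or fewer points is always singular.
        if len(A) <= d:
            raise ValueError("cluster has too few points for its dimension")
        sign, logdet = np.linalg.slogdet(np.cov(A, rowvar=False, bias=True).reshape(d, d))
        if sign <= 0:
            raise ValueError("singular covariance matrix")
        return logdet

    total = -0.5 * log_det_cov(X)
    for c in np.unique(labels):
        Xc = X[labels == c]
        p = len(Xc) / n
        total += 0.5 * p * log_det_cov(Xc) - p * np.log(p)
    return float(total)
```

Under this sketch, a single cluster containing all points always scores 0, and given candidate partitions (for example, k-means labels for several values of k), the one with the smallest value would be selected.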
A heuristic method for combined optimization of layout design and cluster configuration in continuous productions
Facility layout problems have generally been solved either hierarchically or integrated into other phases of plant design. In this paper, a hybrid method is introduced so that clustering and facility layout can be optimized simultaneously. Each cluster is formed by a group of connected facilities, and the aim is to select the most appropriate cluster configuration. Since exact method by MIP i...
2D Dimensionality Reduction Methods without Loss
In this paper, several two-dimensional extensions of principal component analysis (PCA) and linear discriminant analysis (LDA) techniques have been applied in a lossless dimensionality reduction framework for a face recognition application. In this framework, the benefits of dimensionality reduction were used to improve the performance of its predictive model, which was a support vector machine (...
Dimensionality Reduction Evolution and Validation
In this paper, visualized and quantitative evaluation methods are proposed for validating the performance of dimensionality reduction techniques. Four well-known dimensionality reduction techniques are evaluated to verify their capacity to generate a lower-dimensional chemical space with minimum information error. A real chemical database is used to generate a sample with a specific structure as an input to evolu...
Publication date: 2012